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Abstract. Linear mixed models are able to handle an extraordinary 
range of complications in regression-type analyses. Their most com- 
mon use is to account for within-subject correlation in longitudinal 
data analysis. They are also the standard vehicle for smoothing spa- 
tial count data. However, when treated in full generality, mixed models 
can also handle spline-type smoothing and closely approximate kriging. 
This allows for nonparametric regression models (e.g., additive mod- 
els and varying coefficient models) to be handled within the mixed 
model framework. The key is to allow the random effects design matrix 
to have general structure; hence our label general design. For contin- 
uous response data, particularly when Gaussianity of the response is 
reasonably assumed, computation is now quite mature and supported 
by the R, SAS and S-PLUS packages. Such is not the case for binary 
and count responses, where generalized linear mixed models (GLMMs) 
are required, but are hindered by the presence of intractable multi- 
variate integrals. Software known to us supports special cases of the 
GLMM (e.g., PR0C NLMIXED in SAS or glmmML in R) or relies on the 
sometimes crude Laplace-type approximation of integrals (e.g., the SAS 
macro glimmix or glmmPQL in R). This paper describes the fitting of 
general design generalized linear mixed models. A Bayesian approach is 
taken and Markov chain Monte Carlo (MCMC) is used for estimation 
and inference. In this generalized setting, MCMC requires sampling 
from nonstandard distributions. In this article, we demonstrate that 
the MCMC package WinBUGS facilitates sound fitting of general design 
Bayesian generalized linear mixed models in practice. 
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1. INTRODUCTION 

The generalized linear mixed model (GLMM) is 
one of the most useful structures in modern statis- 
tics, allowing many complications to be handled within 
the familiar linear model framework. The fitting of 
such models has been the subject of a great deal of 
research over the past decade. Early contributions 
to fitting various forms of the GLMM include Sti- 
ratelli, Laird and Ware (1984), Anderson and Aitkin 
(1985), Gilmour, Anderson and Rae (1985), Schall 
(1991), Breslow and Clayton (1993) and Wolfinger 
and O'Connell (1993). A summary is provided by 
McCulloch and Searle (2001, Chapter 10). 

Most of the literature on fitting GLMMs is geared 
toward grouped data. Examples include repeated bi- 
nary responses on a set of subjects and standardized 
mortality ratios in geographical subregions. How- 
ever, GLMMs are much richer than the subclass 
needed for these situations. The key to full gener- 
ality is the use of general design matrices, for both 
the fixed and random components. Once again, we 
refer to McCulloch and Searle (2001, Chapter 8) for 
an overview of general design GLMMs. An excel- 
lent synopsis of general design linear mixed mod- 
els is provided by Robinson (1991) and the ensuing 
discussion. One of the biggest payoffs from the gen- 
eral design framework is the incorporation of non- 
parametric regression, or smoothing, through penal- 
ized regression splines (e.g., Wahba, 1990; Speed, 
1991; Verbyla, 1994; Brumback, Ruppert and Wand, 
1999). Higher dimensional extensions essentially cor- 
respond to generalized kriging (Diggle, Tawn and 
Moyeed, 1998). This allows for smoothing-type mod- 
els such as generalized additive models to be fitted 
GLMM and combined with the more tradi- 
tional grouped data uses. This is the main thrust 
of the recent book by Ruppert, Wand and Carroll 
(2003), a summary of which is provided by Wand 
(2003). General designs also permit the handling of 
crossed random effects (e.g., Shun, 1997) and mul- 
tilevel models (e.g., Goldstein, 1995; Kreft and de 
Leeuw, 1998). 

The simplest method for fitting general design 
GLMMs involves Laplace approximation of integrals 
(Breslow and Clayton, 1993; Wolfinger and O'Connell, 
1993) and is commonly referred to as penalized quasi- 
likelihood (PQL). However, the approximation can 
be quite inaccurate in certain circumstances. Bres- 
low and Lin (1995) and Lin and Breslow (1996) 



showed that PQL leads to estimators that are asymp- 
totically biased. For situations such as paired bi- 
nary data the PQL approximation is particularly 
poor. In their summary of PQL, McCulloch and 
Searle (2001, Chapter 10, pages 283-284) concluded 
by stating that they "cannot recommend the use 
of simple PQL methods in practice." In this arti- 
cle we take a Bayesian approach and explore the 
Markov chain Monte Carlo (MCMC) fitting of gen- 
eral design GLMMs. One advantage of a Bayesian 
approach over its frequentist counterpart includes 
the fact that uncertainty in variance components 
is more easily taken into account (e.g., Handcock 
and Stein, 1993; Diggle, Tawn and Moyeed, 1998). 
As summarized in Section 9.6 of McCulloch and 
Searle (2001), the frequentist approach to this prob- 
lem is thwarted by largely intractable distribution 
theory. Under a Bayesian approach, posterior dis- 
tributions of parameters of interest take this vari- 
ability into account. The hierarchical structure of 
the Bayesian GLMMs lends itself to Gibbs sampling 
schemes, albeit with some nonconjugate full condi- 
tionals, to sample from these posteriors. In addi- 
tion, it is computationally simpler to obtain variance 
estimates of the predictions of the random effects. 
Booth and Hobert (1998) showed that, in a frequen- 
tist framework, second-order estimation of the con- 
ditional standard error of prediction for the random 
effects requires bootstrapping the maximum likeli- 
hood estimates of the fixed effects and variance com- 
ponents. For complicated random effects structures, 
computation of a single maximum likelihood fit can 
be expensive, making the bootstrap computation- 
ally prohibitive. In the Bayesian framework, inter- 
est focuses on the posterior variance of the random 
effects given the data, which is a by-product of the 
MCMC output. Note, however, that the Bayesian 
approach involves specification of prior distributions 
of all model parameters. This requires some care, es- 
pecially when sample sizes are small. 

There have been a few other contributions to Bayesi- 
an formulations of GLMMs in the literature. Those 
known to us are Zeger and Karim (1991), Clayton 
(1996), Diggle, Tawn and Moyeed (1998) and Fahrmeir 
and Lang (2001). However, each of these articles is 
geared toward special cases of GLMMs. The GLMMs 
described in this article are much more general and 
allow for random effects models for longitudinal data, 
crossed random effects, smoothing of spatial count 
data, generalized additive models, generalized geo- 
statistical models, additive models with interactions, 
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varying coefficient models and various combinations 
of these (Wand, 2003). 

Section 2 lays out notation for general design 
GLMMs and gives several important examples. 
MCMC implementation is described in Section 3, 
with a focus on the WinBUGS package. Section 4 pro- 
vides three illustratory data analyses. We close with 
some discussion in Section 5. 

2. MODEL FORMULATION 

The GLMMs for canonical one-parameter expo- 
nential families (e.g., Poisson, logistic) and Gaussian 
random effects take the general form 



(1) 



[y|/3,u]=exp{y T (X/3 + Zu) 



l T 6(X/3 + Zu) + l T c(y)}, 



(2) [u|G]~JV(0,G), 

where here and throughout the distribution of a ran- 
dom vector x is denoted by [x] and the conditional 
distribution of y given x is denoted by [y|x]. 

In the Poisson case b(x) = e x , while in the logis- 
tic case b(x) = log(l + e x ). A few other models (e.g., 
gamma, inverse Gaussian) also fit into this structure 
(McCullagh and Nelder, 1989). A number of exten- 
sions and modifications are possible. One is to allow 
for overdispersion, especially in the Poisson case. In 
this paper we will restrict attention to the canoni- 
cal one-parameter exponential family structure. In 
most situations, the main parameters of interest are 
contained in (3 and G, and prior distributions for 
them need to be specified; see Sections 2.1 and 2.2. 

It is important to separate out random effects 
structure for handling grouping. One reason is that 
this allows for the possibility of hierarchical center- 
ing in the MCMC implementations (Section 2.3). 
It also recognizes the different covariance structures 
used in longitudinal data modeling, smoothing and 
spatial statistics. Such considerations suggest the 
breakdown 



(3) 
where 



X(3 + Zu = X R /3 R + Z R u R 

+ X G /3 G + Z G u G + Z G u G , 



X R 



Z R = blockdiag(Xf ) 

Ki<m 



and 



Cov(u R ) = blockdiag(£ fl ) = I m ® £ R 

Ki<m 



correspond to random intercepts and slopes, as typ- 
ically used for repeated measures data on m groups 
with sample sizes ni, . . . , n m . Here X^ is an rii x q R 
matrix for the random design corresponding to the 
ith group, 5] R is an unstructured q R x q R covariance 
matrix and <8> denotes Kronecker product. 

Next, X G and Z G are general design matrices, 
usually of different form than those arising in ran- 
dom effects models. In many of our examples, X G 
contains indicator variables or polynomial basis func- 
tions of a continuous predictor, while Z G contains 
spline basis functions (e.g., Brumback, Ruppert and 
Wand, 1999). The Z G u G term may be further de- 
composed as 



Z G u G 



XXu G 

i=i 



with each Z G , 1 < i < L, usually corresponding to a 
smooth term in an additive model. Also, in keeping 
with spline penalization, we only consider 

Cov(u G ) = blockdiag(cr^I). 

i<e<L 

Note that the decomposition (3) is not unique for 
a particular model. For instance, in the crossed ran- 
dom effects model given in the following Example 3, 
we present two methods of decomposition. 

The Z G u G component represents random effects 
with spatial correlation structure. This can be done 
in a number of ways (e.g., Wakefield, Best and Waller, 
2001); we will just describe one of the more common 
approaches here. Suppose disease incidence data are 
available over N contiguous regions. The random ef- 
fect u G vector is of dimension N with entries £7 G , 
...,Uft. The conditional distribution of Iff given 
Uj, j ^i, is a univariate normal distribution with 
mean equal to the average U G values of ?7 G 's neigh- 
boring regions and variance equal to a\ divided by 
the number of neighboring regions. This is known 
as the intrinsic Gaussian autoregression distribution 
(Besag, York and Mollie, 1991). This leads to u G 
having an improper density proportional to 



(4) 



exp < 



El 



(vf-uy) 



C\2 



where denotes spatially adjacent regions. 

The versatility of (3) can be appreciated by con- 
sidering the following set of examples. Note that we 
use truncated linear basis functions for smoothing 
components to keep the formulations simple (e.g., 
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Brumback, Ruppert and Wand, 1999). In practice 
these may be replaced by B-splines (Durban and 
Currie, 2003) or radial basis functions (French, Kam- 
mann and Wand, 2001). Knots are denoted by K k 
with possible superscripting. Ruppert (2002) dis- 
cussed choice of knots of univariate smoothings, whereas 
Nychka and Saltzman (1998) described the choice of 
knots for multivariate smoothing and kriging. In the 
examples we use 1^ to denote a d x 1 vector of ones. 

Example 1. Random intercept: 
(X/3 + Zu)y = A) + Ui + foxy, 

1 < j < Tii, 1 < i < m, 
Xf = l n4 , X G - 
Z G = Z C = 0, 
Example 2. Random intercept and slope: 
(X/3 + Zu) y = A, + C7i + (Ai + Fijoiy, 

1 < _7 < j , 1 < z < m, 

1 " 



x R = z R = z c 

G 



np|Im 



2i =(T„. 



X 



1 ^-7 



X 



G 



rG 



0, 



2 



Example 3. Crossed random effects model: 
(X/3 + Zu) ii , =(3 + Ui + U!,, 

l<i<n, l<i'<n', 



G 



X 

.G 



rG 



nn' ; 



[I n <g) l n /|l„ (8)I„'] 5 



Z° =0, 
u~ = [[/!,..., C/ re , C/ l5 ..., C/;,] T , 
Cov(u G ) = blockdiag(cr^I n , <r^/I n /). 
An alternative representation of this model is 



y-R — i 



fG 



G 



rG 



0, 



Cov(u 



G\ 



a u ,l n '. 



This allows for implementation of hierarchical cen- 
tering as described in Section 2.3. 

Example 4. Nested random effects model: 

(X/3 + Zu)ijfc = /3 + C/j + T^-(j) + /3i Xjjfe, 

1 < i < m, 1 < j < n, 1 < A; < p, 

X G ' = [1 aiijfc]i<i< m ,l<3<Ti,l<fc<p) 



(In® lp)], 

U" = [C/i,...,^,^!),..., 

K(l)> • • • i ^l(m)> • • • i Ki(m)] T > 

Cov(u G ) = blockdiag(o^I m , o^I np ). 
Example 5. Generalized scatterplot smoothing: 

K 

(X/3 + Zu)i = /3 + /Siscj + ^ u k {xi - K k ) + , 

k=i 



X 



G 



[1 X 



i Ki<n ; 



rG 



(Xi - «Jfc). 
Kfc<A 



X' 



rR 



rC 



l<i<n 
0. 



Cov(u G ) = a\\ K . 
Example 6. Generalized additive model: 



(X/3 + Zu)i = A) + PsSi + J2 u k( s i - 4 



X 



G 



rG 



fc=l 

if* 

+/%+^4(*i-4)+» 

fe=l 

[1 ii]l<i<nj 

(Sj — rv|) + (tj — K k )_ 
Kk<K s l<k<K* 



Ki<n 



X 



R 



rR 



rC 



0, 



Cov(u ) =blockdiag(cr us I^ s ,a, tt I ft : t ). 

Example 7. Generalized additive semiparamet- 
ric mixed model: 



(X/3 + Zu 



/So + C/i + (A? + VSjfcj 
+ (/3 r + W i )r ii + /3ix ij 



x 



x ( 



fc=l 

A'* 

+A% + 5Z4(*ii-4)+ 5 

k=l 

' 1 ?ii r 41 
. 1 Qim Titu 

Sij t{j Xij ] l<j<nj,l<i<m) 
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tG 



Kk<K 3 



x 'kJ + 



l<k<K* 

Z C = 0, 

~S R = unstructured 3x3 covariance matrix. 

Cov(u G ) = blockdiag(c7^ s Iif S ,cr^I^t). 

Example 8. Generalized bivariate smoothing/ 
low-rank kriging: 

K 

(X/3 + Zu)i = A) + /37xi + £ u fe C(|| Xi - Kfc||), 



X 



G 



rG 



X 

Cov(u G ) = <#. 





A: 


[1 


x[]l<i<n 


H 


|| x i — K fc| 




v l<k<K 




= Z c = 0, 







Ki<n 



Here ||v|| = \/v T v, C(r) =r 2 log|r| corresponds to 
low-rank thin plate splines with smoothness parame- 
ter set to 2 (as defined in Wahba, 1990) and C{r) = 
exp(—\r / p\)(l + \r/p\) corresponds to Matern low- 
rank kriging with range p > and smoothness pa- 
rameter set to 3/2 (as defined in Stein, 1999; Kam- 
mann and Wand, 2003). Several more examples could 
be added, including some where Z c ^ 0. 

2.1 Fixed Effects Priors 

Throughout we take the prior distribution of the 
fixed effects vector (5 to be of the form 

D9]~JV(0,F) 

for some covariance matrix F. In practice it is com- 
mon to take F to be diagonal with very large en- 
tries, corresponding to noninformative priors on the 
entries of f3. Such a strategy ensures that, with ap- 
propriate choice of prior for the variance compo- 
nents, the resulting joint posterior distribution of 
the parameters will be proper. Even so, the result- 
ing posterior distributions approximate those based 
on uniform priors for j3. For normal models, Gel- 
man (2005) noted that because we typically have 
enough data to estimate these coefficients from the 
data, any noninformative prior is adequate. For bi- 
nary response models, Natarajan and Kass (2000) 
showed that, under mild regularity conditions that 
usually amount to soft requirements on the number 
of successes and failures in the data set, use of a 
uniform distribution for (3 in conjunction with an 



appropriate prior for the variance components re- 
sults in a proper posterior. For logistic regression, 
Bedrick, Christensen and Johnson (1997) noted that 
the normal prior for j3 is convenient in large sample 
situations in which the posterior is approximately 
normal. In other situations, one should be cautious 
about using a normal prior with large covariances, 
because the induced prior distributions for each P(y = 
1) can have point masses at zero and one. In such 
cases, it may be preferable to use the conditional 
means priors proposed by Bedrick, Christensen and 
Johnson (1996), which specify prior distributions on 
the success probabilities directly. 

2.2 Covariance Matrix Priors 

Over the last decade and-a-half, prior elicitation 
for the variance components in Bayesian GLMMs 
has been an active area of statistical research. Sev- 
eral authors have demonstrated that the use of im- 
proper priors for these parameters can lead to im- 
proper posteriors, with Gibbs samplers unable to de- 
tect such ill-conditioning (Hobert and Casella, 1996). 
As a result, a popular choice is the use of proper 
but "diffuse" conditionally conjugate priors. In the 
GLMM setting with normal random effects, this cor- 
responds to an inverse gamma (IG) distribution for 
a single variance component and an inverse Wishart 
distribution for a variance-covariance matrix. For 
hierarchical versions of GLMMs, however, recent re- 
search has shown that these priors can actually be 
quite informative, leading to inferences that are sen- 
sitive to choice of the hyperparameters for these dis- 
tributions (Natarajan and McCulloch, 1998; Natara- 
jan and Kass, 2000; Gelman, 2005). Natarajan and 
Kass (2000) and Gelman (2005) have proposed alter- 
native prior elicitation strategies that improve upon 
the conditionally conjugate priors. In Section 4 we 
outline a sensitivity analysis approach that takes 
these latest proposals into account. 

2.3 Hierarchical Centering 

Hierarchical centering of parameters can improve 
convergence of Markov chain Monte Carlo schemes 
(Section 3) for fitting Bayesian mixed models (e.g., 
Gelfand, Sahu and Carlin, 1995). In the context of 
this section, hierarchical centering involves repara- 
metrization of (f3 R ,u R ) to ((3 R ,~f), where 

7 = {(Z R ) T Z R y 1 (Z R ) T X R p R + u R . 



The new vector of parameters 7 can be further di- 
vided into m subvectors 7^ with y i = (5 



R + uf , so 
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7 



7i 



7, 



Then the general design generalized linear mixed 
model becomes 

L 

X/3 + Zu = Z R j + X. G (3 G + z ? u ? + Z'V 7 . 

Note that hierarchical centering is not a well-defined 
concept for general design or spatial structures, 
because u G and u c cannot be centered in a hier- 
archical way similarly to that for u R . As a result, 
the general design and spatial structures do not con- 
tribute to the model for the mean in a conditionally 
hierarchical manner. 

2.4 Applications 

This section describes three public health appli- 
cations that benefit from general design Bayesian 
GLMM analysis. The analyses are postponed to Sec- 
tion 4. 

2.4.1 Respiratory infection in Indonesian children. 
Our first example involves longitudinal measurements 
on 275 Indonesian children. Analyses of these data 
have appeared previously in the literature (e.g., Dig- 
gle, Liang and Zeger, 1994; Lin and Carroll, 2001), 
so our description of them will be brief. The response 
variable is binary: the indicator of respiratory infec- 
tion. The covariate of most interest is the indica- 
tor of vitamin A deficiency. However, the age of the 
child has been seen in previous analyses to have a 
nonlinear effect. 

A plausible model for these data is the Bayesian 
logistic additive mixed model 



(5) 



logit{P(respiratory infection^- = 1)} 

= ft + tfi + /3 T x y + /(age if ) ) 



where 1 < i < 275 indexes child and 1 < j < rii in- 
dexes the repeated measures within child. Here U x ~ 
N(0,afj) is a random child effect, Xjj denote mea- 
surements on a vector of nine covariates — height and 
indicators for vitamin A deficiency, sex, stunting 
and visit number — and / is modeled using penalized 
splines with spline basis coefficients A r (0, a^). 



2.4.2 Caregiver stress and respiratory health. The 
Home Allergen and Asthma study is an ongoing lon- 
gitudinal study that is investigating risk factors for 
incidence of childhood respiratory problems includ- 
ing asthma, allergy and wheeze (Gold et al., 1999). 
The portion of the study data that we will consider 
consists of 483 families who were followed for two 
and-a-half years after the birth of a child. At the 
start of the study, a number of demographic vari- 
ables were measured on each family, including race, 
categorized household income, categorized caregiver 
educational level and child's gender. Additionally, 
one of the hypothesized risk factors for childhood 
respiratory problems is exposure to a stressful en- 
vironment (Wright et al., 2004). Each child's envi- 
ronmental stress level was measured approximately 
bimonthly by a telephone interview and assessed on 
a discrete ordinal scale from (no stress) to 16 (very 
high stress) . This assessment was based on the four- 
item Perceived Stress Scale (PSS-4) (Cohen, 1988). 

Let 1 < i < 483 index family and let 1 < j < ni in- 
dex the repeated measurements within each family. 
We arrived at the following Bayesian Poisson addi- 
tive mixed model for stress experience by caregiver 
i when the child was age,^: 



(6) 



stress 



i.i 



Poisson [exp{/? + U + (3 T Xij + /(age^ •)}] 



ind. 



The random intercept, U *~* N(0, o~fj), is a random 
family effect, and Xj includes indicators of annual 
family income and race (see Figure 4 for details). 
The term /(age^ ) is a nonparametric term that we 
model using penalized splines with spline coefficients 

Ufc ~' A r (0, ct„). We include the nonparametric term 
in the model for the effect of stress as a function of 
child's age because, outside of anecdotal evidence, 
we do not know of a biologically motivated para- 
metric model for stress as a function of child's age. 
We arrived at the other terms in the model (and 
removed other demographic terms and interactions 
from the model) based on discussions with the inves- 
tigators in the study and exploratory data analyses 
that we fitted via maximum PQL. 

2.4.3 Standardized cancer incidence and proxim- 
ity to a pollution source. Elevated cancer rates were 
observed in a region of Massachusetts, USA, known 
as Upper Cape Cod, during the mid-1980s, and one 
risk factor of interest is a fuel dump at the Mas- 
sachusetts Military Reservation (MMR) (Kammann 
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and Wand, 2003; French and Wand, 2004). For nearly 
20 years the Massachusetts Department of Public 
Health (MDPH) has maintained a cancer registry 
data base which records incident cases for 22 types 
of cancers, including lung, breast and prostate can- 
cers. In this example we focus on female lung cancer 
between 1986 and 1994. 

We use a semiparametric Poisson spatial model 
to investigate the relationship between census tract 
level female lung cancer standardized incidence rates 
(SIRs) and distance to the MMR. Let i = 1, ... ,45 
represent the census tracts in the study, and let 
observed; and expectedj be the observed and ex- 
pected number of incident cases of female lung can- 
cer in tract i (i.e., numerator and denominator of the 
SIR), respectively. After fitting a number of models 
that included terms for additional demographic fac- 
tors and water source, we arrived at the semipara- 
metric Poisson spatial model 

observed^ 

(7) ~ Poisson[expectedj exp{/?o + Uf + j3\Xi 

+ /(dist i )}], 

where Xi is the percentage of women in tract i who 
were over 15 and employed outside the home in 1989, 
and distj is the distance from the centroid of cen- 
sus tract i to the centroid of the MMR. Here, u c = 
(Ui , . . . , U^ 5 ) T is a vector of spatially correlated ran- 
dom effects with intrinsic Gaussian autoregression 
distribution parametrized by variance component a^, 
as defined in (4). To complete the specification of 
the spatial correlation model, we choose a cutoff dis- 
tance value d and treat two census tracts as neigh- 
bors if the distance between their centroids is less 
than or equal to d. We choose d = 7.5 km, which cor- 
responds to the cutoff such that every census tract 
has at least one neighbor. We model the nonpara- 
metric term /(distj) using penalized splines with 

coefficients '~' N(0,a^). 

3. FITTING VIA MARKOV CHAIN MONTE 
CARLO 

In the general design GLMM (1) and (2), the pos- 
terior distribution of 

v T = [(3 T u T ] 

is 

[u\y] = ( f exp{y T CV - 1 T 6(CV) 



(8) -iaoglGI + iAv-^HG^Gj 

• (J J exp{y T Ci/- l T b{Cu) 

-i(log|G|+^ T V-V)} 

■[G]dGdu^j , 

where C = [ X Z ], V = blockdiag(F, G) and [G] is 
the prior on the variance components in G. These 
integrals are analytically intractable for most prob- 
lems. Furthermore, in the applications we consider, 
the dimensionality precludes the use of numerical in- 
tegration. A standard remedy is to apply a Markov 
chain Monte Carlo algorithm to draw samples from (8). 
An overview of MCMC is provided by Gilks, Richard- 
son and Spiegelhalter (1996). 

The MCMC methods break up the model param- 
eters into subsets and then sample from the condi- 
tional distributions given the remaining parameters 
and data, which are often called full conditionals. In 
the general design GLMM, the natural breakdown 
of the parameters is into v and G, leading to the 
full conditionals 

MG,y] and [G\u,y]. 

The latter full conditional has a standard form when 
the prior on the variance components is inverse gamma 
or Wishart, which are "conditionally conjugate" pri- 
ors for this model, but not when, say, a folded- 
Cauchy prior is used (e.g., Gelman, 2005). The first 
full conditional has the general form 

[i>|G,y] ocexp{y T Ci/- l T b(Cv) - ±i/ t V _1 i/}, 

which is a nonstandard distribution unless y is con- 
ditionally Gaussian. Clever strategies such as adap- 
tive rejection sampling (Gilks and Wild, 1992) and 
slice sampling (e.g., Besag and Green, 1993; Neal, 
2003) are required to draw samples. The most com- 
mon versions of these algorithms work with the full 
conditionals of the components v. When V is diag- 
onal, these full conditionals are of the form 

[i/fc|i/_jfc, G,y] 

(9) oc exp{(C T y) fc ^ fc - l T 6(c fc ^ + C_ fe ^_ fe ) 

- H/( V W- 

Here c& is the kth. column of C, is C with the 
kth column omitted, is the kth entry of v and 
V-k is v with the /cth entry omitted. It is easily 
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shown that (9) is log-concave, which permits use of 
adaptive rejection sampling and simplifies slice sam- 
pling. These algorithms can also be used to sample 
from the full conditionals for the variance compo- 
nents when necessary. 

Zhao (2003) provides a detailed account of MCMC 
for general design GLMM and compares several strate- 
gies via simulation. One of the conclusions drawn 
from the simulations is that the WinBUGS package 
(Spiegelhalter, Thomas and Best, 2000) performs 
excellently among various "off-the-shelf" competi- 
tors. This is very good news because it saves the 
user having to write his or her own MCMC code. 
However, it should be noted that for large models 
WinBUGS can take quite some time to obtain a fit. 
Also, the analysis must be performed on a particu- 
lar platform (Windows). Assuming that computation 
time is not an issue and that Windows is available, we 
can report that fitting of general design GLMMs via 
WinBUGS has a large chance of success. For our anal- 
yses we had access to several personal computers 
and ran multiple chains in parallel to assess conver- 
gence and prior sensitivity. This reduced the elapsed 
time it took to compute each one of the analyses by 
an order of magnitude. 

4. DATA ANALYSES 

4.1 Input Values and Prior Distributions 

We used WinBUGS to fit the models described in 
Section 2.4. However, several input values and prior 
distributions needed to be specified, so we preface 
the analyses with the particular choices that were 
made. A more detailed study on the use of WinBUGS 



putting f(x) 



(10) 



■ /?o + 0i x + Z x u, where 

-1/2 







1 |3 






\ K k ~ K k' \ 


. l<k<K - 




- l<k,k'<K - 



and 



u~N(0,a 2 u I) 



(French, Kammann and Wand, 2001), with Kk = 
(7Tf2")t,h quantile of the unique predictor values. In 
general, K can be chosen using rules such as 

K = min( j (number of unique predictor values), 35) 

or those given in Ruppert (2002). However, often 
considerably smaller K can be used through experi- 
mentation with the benefit of faster MCMC fitting. 
This approach was taken in our analyses. 

We considered several common variance compo- 
nent priors. These were inverse gamma with equal 
scale and shape, 



\a 2 ] oc (a 2 ) 



-(a+l) e -a/a 2 



denoted by IG(o,a) for a = 0.001,0.01,0.1 
(Spiegelhalter et al., 2003), and the folded-i class 
of priors for a (Gelman, 2005) 

/ 1 fa\ 2 \^ v+l ^ 2 
[a]oc(l + -(- 



where s and v are fixed scale and degrees of freedom 
hyperparameters, respectively. We investigated the 
sensitivity of the model fit to the choice of the hy- 
perparameters. Results showed that fits based on 
the IG priors were stable for a > 0.01, but those ob- 
tained assuming a = 0.001 behaved erratically. Out 
of the remaining choices, the folded-Cauchy prior, 
a member of the folded-f class of priors, performed 
well. 

for fitting models of this type is provided by Crainiceanu, As a result of these empirical comparisons, in our 



Ruppert and Wand (2005). 

Based on the recommendations of Gelfand, Sahu 
and Carlin (1995), we used hierarchical centering 
of random effects. All continuous covariates were 
standardized to have zero mean and unit standard 
deviation. A strategy such as this is necessary for 
the method to be scale invariant given fixed choices 
for the hyperparameters. We used radial cubic ba- 
sis functions for smooth function components. Apart 
from making the fitted functions smooth and requir- 
ing a relatively small number of knots, Crainiceanu, 
Ruppert and Wand (2005) reported that they had 
good mixing properties in MCMC analysis. Radial 
cubic basis function modeling of a function / entails 



examples we take the approach of fitting models un- 
der multiple prior distributions for the variance com- 
ponents and assessing the sensitivity of the results 
to these assumptions. Due to its popularity, we fit 
general design GLMMs using independent IG(0.01, 
0.01) priors for each variance component. Results 
suggest this prior performs well for the examples 
considered in this paper. We also refitted the models 
using independent folded-Cauchy prior distributions 
for each variance component. For a variance compo- 
nent square root a and fixed scale parameter s, the 
folded-Cauchy distribution has probability distribu- 
tion 

[a]o,(a 2 + s 2 )- 1 . 



GENERAL DESIGN 

Following Gelman (2005), we take s = 25 in our ex- 
amples and check the sensitivity of results to this 
choice by also fitting the models for s = 12. This 
prior can be implemented in WinBUGS using the flex- 
ible feature that allows a user to code an arbitrary 
prior distribution for the model parameters (see the 
Appendix). We also ran the models using a Uni- 
form(0, 100) prior on a. A theoretical comparison 
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of such priors in the general design GLMM setting 
is a topic worthy of future research. 

Table 1 summarizes the input values and prior 
distributions that were used. 

4.2 Respiratory Infection in Indonesian Children 

Using the prior distributions and input values given 
in Table 1, WinBUGS produced the output for the (3 
coefficients summarized in Figure 1. It is seen that, 
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Fig. 1. Summary 
trace plot of sample 
Gelman-Rubin \[~R 



of WinBUGS output for parametric components of (5). The full titles for columns are name of variable, 
of corresponding coefficient, plot of sample against 1-lagged sample, sample autocorrelation function, 
diagnostic, kernel estimate of posterior density and basic numerical summaries. 
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Table 1 

Input values and prior distributions used in WinBUGS for 
analyses 



Hierarchical centering used for random intercepts. 

Continuous covariates standardized to have 

zero mean and unit standard deviation. 

Radial cubic basis functions for smooth functions. 

length of burn-in 5000 
length of "kept" chain 5000 
thinning factor 5 
prior for fixed effects N(0, 10 s ) 

(IG(0.01, 0.01) 

prior for variance components < folded-Cauchy with s = 12,25 

I Uniform(0, 100) 



first quart! te of age 
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Fig. 2. Summary of WinBUGS output for estimate of f(age). The top left panel is the posterior mean of the estimated 
probability of respiratory infection with all other covariates set to their average values. The shaded region is a corresponding 
pointwise 95% credible set. The remaining panels are trace plots of samples used to produce the top left plot at quartiles of the 
age data. 



GENERAL DESIGN BAYESIAN GLMM 



11 



for this model, the chains mix quite well with little 
significant autocorrelation and Gelman-Rubin 
values (Gelman and Rubin, f992) all less than f .Of . 
Vitamin A deficiency is seen to have a borderline 
positive effect on respiratory infection, which is in 
keeping with previous analyses. Similar comments 
apply to sex and some of the visit numbers. 

The estimated effect of age is summarized in the 
top left panel of Figure 2 and is seen to be signifi- 
cant and nonlinear. The remaining panels show good 
mixing of the chains corresponding to the estimated 
age effect at quartiles of the age data. Gelman- 
Rubin \f~R plots (not shown here) support conver- 
gence of these chains. 

To assess the sensitivity of our conclusions to the 
choice of variance component priors, we also ran 
the Gibbs samplers assuming the folded-Cauchy and 
Uniform priors for the random effects standard de- 
viations (see Section 2.2). Figure 3 shows the pos- 



terior estimates and 95% credible intervals for the 
regression coefficients of interest using the default 
independent IG priors, independent folded-Cauchy 
priors with s = 25, independent folded-Cauchy pri- 
ors with s = 12 and independent [7(0, f 00) priors. 
This figure shows that results are not sensitive to 
this choice, with the changes in the posterior means 
never more than 2% of that obtained from the IG 
specification and the credible intervals never more 
than 6.5% wider than their IG counterparts. 

4.3 Caregiver Stress and Respiratory Health 

For this example, we also used the priors and input 
values given in Table 1 and we provide the WinBUGS 
code in the Appendix. For the spline we used 12 
knots that were spaced evenly on the percentiles of 
age. We found that the fit did not change noticeably 
if we used more knots and we chose a small number 
of knots for computational efficiency. 
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Fig. 3. Results of sensitivity analysis for variance component priors for model (5). 
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Fig. 4. Summary of WinBUGS output for parametric components of (6). The full titles of columns are name of variable, 
trace plot of sample of corresponding coefficient, plot of sample against 1-lagged sample, sample autocorrelation function, 

Gelman-Rubin \/~R diagnostic, kernel estimate of posterior density and basic numerical summaries. The coefficients can be 
interpreted as time invariant offsets to the time varying population mean. 



Figure 4 shows the Bayes estimates and credible 
intervals for the [3 coefficients as well as an assess- 
ment of the convergence of the chains. The coeffi- 
cients can be interpreted as category-specific offsets 
from the population mean. The chains had a mod- 
erate autocorrelation and the Gelman-Rubin 
values were all less than 1.04. Figure 5 contains the 
estimated age effect and trace plots for the effect of 



age at the quartiles of the data. Again, the Gelman- 
Rubin values were less than 1.04 and support 
convergence. The figures are based on the chain that 
used the independent inverse gamma priors for the 
variance components. Fits that used independent 
Cauchy (s = 25) priors for the square root of the 
variance components changed neither the posterior 
means nor the widths of the credible intervals for 
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FlG. 5. Summary of WinBUGS output for estimate of f(age). 
(PSS-i) as a function of age with all other covariates set to 
pointwise 95% credible set. The remaining panels are trace plots 
age data. 

the parameters of interest by more than 4.7%. The 
posterior mean and confidence set for /(age) was 
also relatively insensitive to the prior on the vari- 
ance components in this example. 

Two aspects of the fit that were of interest to the 
investigators in the study included the inverse dose 
response relationship between income and stress, and 
that race was significantly related to environmental 
stress even after accounting for the effect of income. 
The nonparametric estimate of stress as a function 
of the child's age was also interesting and suggests 
that relatively stressful times include the first few 
months, when the child is approximately a year old, 
and beyond age two. 



first quart I e of age 
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index 
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The top left panel is the posterior mean of the mean stress 
their average values. The shaded region is a corresponding 
of samples used to produce the top left plot at quartiles of the 



4.4 Standardized Cancer Incidence and 
Proximity to a Pollution Source 

As in the previous examples, we started with the 
prior distributions and inputs in Table 1. In this 
case though, the chain required a longer burn in. 
We found that a burn in of length 15,000 was suf- 
ficient to produce acceptable convergence. Figure 6 
(bottom panel) contains the resulting convergence 
diagnostics and inferences for the parameters in the 
model. The middle panel of Figure 6 contains an es- 
timate of the contribution of distance to the MMR 
to the standardized incidence and trace plots of the 
function estimate at the quartiles of distance. The 
Gelman-Rubin values for the estimates at these 
quartiles were less than 1.04 and support conver- 
gence. Finally, the top panel of Figure 6 maps the 
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Fig. 6. Summary of WinBUGS output for the fit of (7). The top panel contains a spatial plot of the smoothed SIRs ( posterior 
means of the (7f 's). The middle panel shows the estimated f(dist) and a corresponding pointwise 95% credible set along with 
trace plots of the samples of the function at the quartiles of distance. The bottom panel displays summaries of other parameters 

of interest and convergence diagnostics. Additionally, the Gelman-Rubin \/~R diagnostics were less than 1.03 for all the 11" s 
and for the f(dist) at the quartiles of distance. The MMR is the area in the center of the map that is excluded from the 
analysis. 
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estimated SIRs based on the model fit, demonstrat- 
ing the smoothing achieved by the spatial model. 
The figures are based on fits that used independent 
IG(0.01, 0.01) priors for the variance components. 
Fits that used independent Cauchy (s = 25) priors 
for the square root of the variance components de- 
creased the length of the credible interval for the 
effect of percent working by 6.7% and lowered the 
posterior mean by 3.1%. The posterior mean and 
confidence set for /(distj) also changed very little. 

The fitted model suggests a nominally positive re- 
lationship between the percentage of women who 
were working outside the home in 1989 and stan- 
dardized lung cancer incidence rates at the census 
tract level. Further, the estimated curve /(distj) 
suggests an increased standardized incidence rate 
for census tracts that are closer than about 10 km 
to the MMR after controlling for other factors, and 
the map suggests that areas immediately east of the 
MMR exhibit the highest SIRs. None of the esti- 
mated effects of the model covariates is strongly sig- 
nificant. Regardless of statistical significance, how- 
ever, we emphasize that this type of "cancer clus- 
ter" study should be viewed as exploratory since the 
study design is ecological (e.g., Kelsey, Whittemore, 
Evans and Thompson, 1996, Chapter 10). Addition- 
ally, reanalyses of similar studies have demonstrated 
that unmeasured confounders could radically change 
the conclusions in these types of analyses (e.g., Ah- 
erns et al., 2001). 

5. DISCUSSION 

As illustrated by the analyses in the previous sec- 
tion, general design Bayesian GLMMs are a very 
useful structure. In this article we have demonstrated 
that WinBUGS provides good off-the-shelf MCMC fit- 
ting of these models. Some of the reviewers have 
pointed out the possibility of designing MCMC al- 
gorithms that take advantage of the special struc- 
ture of Bayesian GLMMs that is summarized in Sec- 
tion 2. We have done some exploration in this direc- 
tion (Zhao, 2003), but would welcome such research 
from MCMC specialists. In the meantime, use of 
WinBUGS is our recommended fitting method. 

APPENDIX: WINBUGS CODE 

In this Appendix we list the WinBUGS code used for 
the data analyses of Section 4. Note that the spline 
basis functions and hyperparameters are inputs. 



The following code was used to fit (5) to the data 
on respiratory infection of Indonesian children. Here 
inverse gamma priors are used on all variance com- 
ponents. 

model 
{ 

for (i in l:num.obs) 
{ 

X[i,l] <- age[i] 

X[i,2] <- vitAdefic [i] 

X[i,3] <- sex[i] 

X[i,4] <- height [i] 

X[i,5] <- stunted [i] 

X[i,6] <- visit2[i] 

X[i,7] <- visit3[i] 

X[i,8] <- visit4[i] 

X[i,9] <- visit5[i] 

logit (mu[i] ) 

<- gamma [sub j ect [i] ] 
+ inprod(beta[] ,X[i,] ) 
+ inprod (u . spline [] , Z . spline [i , ] ) 

resp[i] " dbern(mu[i]) 

} 

for (i.subj in l:num.subj) 
{ 

gamma [i . sub j] <- betaO + u . sub j [i . sub j ] 
u. subj [i . subj] " dnorm(0 , tau.u. subj ) 

} 

for (k in 1: num. knots) 
{ 

u. spline [k] ~ dnorm(0, tau.u. spline) 

} 

betaO ~ dnorm(0,tau.beta) 
for (j in l:num.pred) 
{ 

beta[j] " dnorm(0,tau.beta) 

} 

tau.u. spline ~ dgamma(A .u. spline , 
B.u. spline) 
tau.u. subj " dgamma(A .u. subj ,B .u. subj ) 

} 

The following code was used to fit (6) to the data on 
caregiver stress and respiratory health. This code il- 
lustrates the use of folded-Cauchy priors on variance 
components. As noted in the WinBUGS user man- 
ual (Spiegelhalter, Thomas and Best, 2000), a single 
zero Poisson observation with mean <p contributes a 
term exp(c/>) to the likelihood for a, which is then 
combined with a flat prior over the positive real line 
to produce the folded-Cauchy distribution. 

model 
{ 

for (i in l:num.obs) 
{ 

X[i,l] <- age[i] 
X[i,2] <- incomel[i] 
X[i,3] <- income2[i] 
X[i,4] <- race[i] 



16 



ZHAO, STAUDENMAYER, COULL AND WAND 



log(mu[i] ) 

<- gamma [house [i] ] 
+ inprod(beta[] ,X[i,]) 
+ inprod(u. spline [] , Z . spline [i , ] ) 

y[i] ~ dpois (mu[i]) 

} 

for (i. house in l:num. house) 
{ 

gamma [i .house] <- betaO+u. sub j [i .house] 
u. sub j [i .house] " dnorm(0 , tau.u. subj ) 

} 

for (k in l:num. knots) 
{ 

u. spline [k] " dnorm(0, tau.u. spline) 

} 

betaO ~ dnorm(0,tau.beta) 
for (j in l:num.pred) 
{ 

beta[j] ~ dnorm(0,tau.beta) 

} 

tau.u. spline <- pow(sigma.u. spline , -2) 

zero .u. spline <- 

sigma.u. spline " dunif (0 , 1000) 

phi. u. spline <- log( (pow(sigma.u. spline , 2) 

+ pow (phi . scale .u. spline ,2) ) ) 
zero .u. spline " dpois (phi .u. spline) 
tau.u. subj <- pow(sigma.u. subj ,-2) 
zero. u. subj <- 
sigma.u. subj " dunif (0 , 1000) 
phi . u . subj 

< - log( (pow(sigma.u. subj , 2) 

+ pow (phi . scale .u. subj , 2) ) ) 
zero. u. subj " dpois (phi .u. subj ) 

} 

Below is the code that we used to fit the spatial 
model (7) to the Cape Cod female lung cancer data. 
Please note that the variance components have in- 
verse gamma priors, and adj, weights, and num are 
inputs to car. normal, the normal conditional au- 
toregressive function in WinBUGS. 
model 
{ 

for (i in 1 : num. regions) 
{ 

X[i,l] <- working [i , 1] 
X[i,2] <- distanced, 1] 
theta[i] <- betaO+u. spatial [i] 

+ inprod(beta[] ,X[i,] ) 

+ inprod(u. spline [] , 

Z . spline [i ,] ) 
log(mu[i]) <- log(E[i] )+theta[i] 
[i] " dpois (mu[i]) 
SIRhat[i] <- 100*mu[i]/E[i] 

} 

u. spatial [1 :num. regions] 

~car. normal (adj [] , weights [] , 

num[] , tau.u. spatial) 
for (k in 1: num. knots) 
{ 



u.spline[k] " dnorm(0 . , tau.u. spline) 

} 

for (j in l:num.pred) 
{ 

beta[j] " dnorm(0.0,tau.beta) 

} 

betaO " dnorm(0 . 0, tau.beta) 
tau.u. spatial~dgamma(A.u. spatial , 
B .u. spatial) 
tau.u. spline~dgamma(A .u. spline ,B . u. spline) 

} 
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